370 - vm_clone produces VMs that are silently fragile to multi-disk attach#371
Conversation
6dbb575 to
689fee5
Compare
There was a problem hiding this comment.
Pull request overview
This PR aims to address Issue-370 where VMs created via vm_clone can become fragile when additional disks are attached, by ensuring a boot device order is set after cloning. It also introduces a helper for identifying a “primary” disk and modifies the module’s cloud-init defaults.
Changes:
- After a successful clone, fetch the cloned VM and attempt to set its boot device order.
- Add
VM.get_primary_disk()helper to select a primary disk. - Set a default
cloud_init.user_dataforvm_clone.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| plugins/modules/vm_clone.py | Adds post-clone boot order setting logic; changes cloud-init argument defaults. |
| plugins/module_utils/vm.py | Fixes a disk comment typo and adds a helper to select a “primary” disk. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
acc3fdc to
67eda0b
Compare
26cb659 to
06d8f76
Compare
- cloud init's user data not has a default value
- boot devices by default set as the primary Virtio disk
34a4622 to
fff6ef4
Compare
2d57110 to
f2fa4b5
Compare
e4dcabc to
6f42ae3
Compare
justinc1
left a comment
There was a problem hiding this comment.
I think PR solves exactly the problem described in issue (e.g. virtio disk etc), but also creates additional corner cases.
I think we need to investigate what can be done. What if sorce VM has IDE or SCSI disk, a mixture, a NIC boot device etc.
I'm not sure we will be able to fully solve the problem. I tried to create a simple emtpy VM, with 1 disk, in HC3 web UI. The VM got beside the disk also a NIC, (empty) cloud-init CD-ROM, and scale-guest-tools-4.2.iso CD-ROM. The disk and (empty) cloud-init CD-ROM were set as boot devices by HC3. In this case it seems like source VM boot order could be used as source of truth. But I did not check all potential ways to create a source VM.
|
|
||
| user_data = cloud_init.get("user_data") | ||
| if not user_data: | ||
| return cloud_init |
There was a problem hiding this comment.
No need to fix grub config in this case?
There was a problem hiding this comment.
I'm not 100% sure, this is from the task description: "If the module accepts cloud_init.user_data, the generated cloud-init should include a runcmd to fix the grub UUID issue at first boot".
I read it as "if the the user_data was passed"
| user_data.rstrip() | ||
| + """ | ||
|
|
||
| runcmd: |
There was a problem hiding this comment.
We might end adding a user_data section when we already have a user_data section.
We can add a unit tests for those corner cases, then refaction this function a bit.
Corner cases:
- no user_data
- emtpy user_data
- no user_data.runcmd
- emtpy user_data.runcmd
- user_data.runcmd with entries
I guess we want to always comment out GRUB_DISABLE_LINUX_UUID ?
| hypercore_tags.append(tag) | ||
| data["template"]["tags"] = ",".join(hypercore_tags) | ||
| if cloud_init: | ||
| cloud_init = cls.clone_add_user_data_to_cloud_init(cloud_init) |
There was a problem hiding this comment.
I have impression we want to alway run this?
| - source_info.records.0.nics.0.mac != cloned_info.records.0.nics.0.mac | ||
| - source_info.records.0.node_affinity == cloned_info.records.0.node_affinity | ||
| # Cloned VM's boot devices should be set | ||
| - cloned_info.records.0.boot_devices | length != 0 |
There was a problem hiding this comment.
I would prefer to test with "==".
I guess source_info.records.0.boot_devices | length == 0?
And cloned_info.records.0.boot_devices | length was 0 before, 1 after PR?
| return disk | ||
|
|
||
| # primary disk is the largest Virtio disk | ||
| def get_primary_disk(self): |
There was a problem hiding this comment.
We could clone a VM with IDE or SCSI disk(s).
Not sure why is the largest disk the bootable one. VM with 100 kB cloud-init ISO, 20 GB OS disk, 100 GB installed-apps disk. Maybe the first disk on bus is the OS disk, but if we add/remove disk, we likely can change that too. :( :(
Ideally we could ask HC3 about this setting for the source VM. But source VM might never be booted before clonning.
Or, we could clone VM, boot it, then assuming cloned VM does boot (really boot, not just wait on
"no OS installed") set this from cloned VM runtime state.
A NIC could be a boot device too.
There was a problem hiding this comment.
a cloud-image VM cloned via this module boots correctly for years while it has exactly one virtio disk, then bricks on the first reboot after any tool (CSI driver, manual disk-add, Terraform provider, or even another playbook in the same suite) attaches a second virtio disk.
From my understanding this issue is a corner case to solve the "one virtio disk already exists and is used for boot but isn't specified as one and causes issues later on when another disk is added"
Boot order is now set after cloning to avoid issues when attaching additional disk to VM afterwards.
Cloud init's user datadefaultruncmdis also added ifcloud inithas been passed as parameter.